Talking Head Generation


Talking head generation is the task of synthesizing a video of a person speaking, driven by an audio recording of their voice, typically together with a reference image or video of the speaker.
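Concretely, most audio-driven methods share the same input/output contract: a reference image plus a waveform go in, and a sequence of video frames comes out. The sketch below is a hypothetical stub (not any of the models listed here) that only illustrates that interface; the frame count follows from the audio length and the target frame rate, and a real model would predict per-frame lip and head motion instead of copying the reference.

```python
import numpy as np

def generate_talking_head(reference_image: np.ndarray,
                          audio: np.ndarray,
                          sample_rate: int = 16000,
                          fps: int = 25) -> np.ndarray:
    """Illustrative stub of the common talking-head interface.

    reference_image: (H, W, 3) identity frame of the speaker.
    audio: mono waveform used as the driving signal.
    Returns a video as an array of shape (n_frames, H, W, 3).
    """
    # One output frame per 1/fps seconds of audio.
    duration_s = len(audio) / sample_rate
    n_frames = int(duration_s * fps)
    # Placeholder "model": every frame is the unchanged reference image.
    # A real generator conditions each frame on its audio window.
    return np.repeat(reference_image[None], n_frames, axis=0)

ref = np.zeros((64, 64, 3), dtype=np.uint8)
audio = np.zeros(16000 * 2)  # 2 seconds of silence at 16 kHz
video = generate_talking_head(ref, audio)
print(video.shape)  # (50, 64, 64, 3)
```

The papers below differ mainly in how they implement the "model" step: diffusion in pixel or latent space, 3D face animation, or explicit emotion/style conditioning.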

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Oct 17, 2024

Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads

Oct 14, 2024

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Oct 14, 2024

MMHead: Towards Fine-grained Multi-modal 3D Facial Animation

Oct 10, 2024

Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation

Sep 29, 2024

LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details

Oct 01, 2024

DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis

Sep 16, 2024

EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion

Sep 11, 2024

StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads

Sep 14, 2024

DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures

Sep 11, 2024